
[Draft PR] Graphcore backend support. #1659

Open
wants to merge 77 commits into main

Conversation

Sameeranjoshi

This PR is a draft with lots of learnings from the fpga.py and cuda.py backends.
The goal of the PR is to add a Graphcore backend to DaCE and map DaCE graphs onto Graphcore graphs.
Comments on parts I might be doing wrong would be super useful for the next set of patches.

… from the documentation online to check what library is present
…page, this commit didn't work. They also have something similar: adding a Tensor Core backend from Nvidia as an external codegen
…f experimental changes which try to understand the SDFG IR and the changes needed to register IPUCodeGen into the frame_targets.
	The fix was to call 'self._frame.generate_state' recursively from ipu.generate_state. This goes into framecode.py and
	calls the recursive function, which traverses the substates and calls the respective codegen.
	Based on this learning, a point to note is to remember to call the recursive functions inside the
	generate_*() functions.
	An example is 'self._dispatcher.dispatch_subgraph' in 'generate_scope'.

2. Fix: two nested ipu/ipu folders were created.
	Fix: remove target_type='ipu' from CodeObject.
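The recursion lesson above (call back into the frame's recursive driver from your target's generate_* methods) can be sketched with a small stand-in. This is a minimal illustration with hypothetical class names, not the actual DaCE API; the real signatures live in framecode.py and the target dispatcher.

```
# Minimal sketch (hypothetical classes, not the real DaCE API) of the pattern:
# a target's generate_state must hand nested substates back to the frame's
# recursive driver instead of trying to walk them itself.

class FrameCode:
    """Stand-in for the recursive driver in framecode.py."""
    def __init__(self):
        self.targets = {}

    def generate_state(self, state, stream):
        stream.append(f"// state {state['name']}")
        for sub in state["substates"]:
            # Recurse: dispatch each substate to its registered codegen.
            self.targets[sub["target"]].generate_state(sub, stream)

class IPUCodeGen:
    """Stand-in for the IPU target registered into the frame."""
    def __init__(self, frame):
        self._frame = frame

    def generate_state(self, state, stream):
        stream.append(f"// IPU code for {state['name']}")
        # Key point from the commit message: call back into the frame so
        # nested substates are dispatched recursively as well.
        self._frame.generate_state(state, stream)

frame = FrameCode()
ipu = IPUCodeGen(frame)
frame.targets["ipu"] = ipu

top = {"name": "top",
       "substates": [{"name": "inner", "target": "ipu", "substates": []}]}
out = []
ipu.generate_state(top, out)
# out now contains lines for both "top" and the nested "inner" state
```

Forgetting the `self._frame.generate_state` call in this sketch would emit code only for the outermost state, which mirrors the bug the commit describes.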
…ers and the IPUDevice, this goes in __init__/exit part and not in the SDFG
The code snippet below checks whether the file is the frame file (.cpp vs .cu); if so,
it matches the target and dumps the headers.

```
if backend == 'frame':  # on the cpu .cpp file
    for target in self.targets:
        if target.target_name == 'ipu':
```
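The idea behind this check can be shown as a self-contained sketch. The function name, the dict-based targets, and the specific Poplar headers are assumptions for illustration; the real header list belongs in ipu.py as later commits note.

```
# Hypothetical sketch of the header-dispatch idea: only when emitting the
# frame (.cpp) file do we ask whether an IPU target is registered, and only
# then emit the (assumed) Poplar includes.

def emit_headers(backend, targets):
    headers = []
    if backend == 'frame':  # frame == the host-side .cpp file
        for target in targets:
            if target['target_name'] == 'ipu':
                # Assumed Poplar headers for illustration.
                headers.append('#include <poplar/Graph.hpp>')
                headers.append('#include <poplar/Engine.hpp>')
    return headers
```

This keeps the .cu (and other backend) files free of IPU includes while the host file gets them exactly once.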

Some cosmetic changes; removed the prints.
… add more handcrafted tests, remove the header generation from framecode.py and add into ipu.py
2. On the Host/ side, the pipeline + functions are present.
3. The pipeline is not yet set; the next patch might set it.
4. Lots of bugs still.
5. Create Host/ and Device/ files along with cpu/.
…t the inputs to the library and the inputs to the SDFG; it seems we need to dig into other ways of allocating variables and similar details.

Reverting this, as it is making it hard to understand a node for now.

This reverts commit 63fad92.
@Sameeranjoshi
Author

The golden base code I plan to generate for IPU is here.

The current test I plan to support is (AccessNode + Library Node) - tests/library/poplar/poplar_matmul.py

…m fpga and cuda. This creates a codegen which dumps both library + nodes
Collaborator
@ThrudPrimrose ThrudPrimrose Sep 25, 2024
Using more descriptive names for the memory locations will be more beneficial for future users.
I do not know the memory hierarchy of Graphcore, but if for now we assume that Graphcore has only global memory and an on-chip L1 memory, then I would go with the names IPU_L1 and IPU_Global.

Descriptive naming also helps if IPUs change their memory hierarchy across generations. Let's say they add a level called L2: with generic names you would end up with IPU_Memory and IPU_L2 (and in my opinion, this would be quite confusing).

The same goes for the schedule; IPU_Multicore may be better (this again depends on what Graphcore calls them).
For example:

    StorageType.IPU_L1: ScheduleType.IPU_Multicore,
    StorageType.IPU_Global: ScheduleType.IPU_Device
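The reviewer's naming proposal can be sketched with plain Python enums. These are hypothetical stand-ins for illustration; in DaCE the real StorageType/ScheduleType enums are registered in dtypes.py, and the exact Graphcore memory levels are an assumption as the reviewer notes.

```
# Sketch of the proposed names, assuming a two-level IPU memory hierarchy.
from enum import Enum, auto

class StorageType(Enum):
    IPU_L1 = auto()       # assumed on-chip tile memory
    IPU_Global = auto()   # assumed off-chip / exchange memory

class ScheduleType(Enum):
    IPU_Multicore = auto()
    IPU_Device = auto()

# Default schedule inferred from where the data lives, as in the comment:
STORAGE_TO_SCHEDULE = {
    StorageType.IPU_L1: ScheduleType.IPU_Multicore,
    StorageType.IPU_Global: ScheduleType.IPU_Device,
}
```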

Collaborator

There are type maps such as _CTYPES (C/C++) or _OCL_TYPES (OpenCL) in dtypes.py.
I think having this map there with the other type maps would be consistent with the previous implementations.
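In the spirit of those maps, a Poplar type map could look like the sketch below. The name _POPLAR_TYPES and the host-side keys are hypothetical; the Poplar-side strings follow common poplar::Type spellings but should be checked against the SDK.

```
# Hypothetical type map in the style of _CTYPES/_OCL_TYPES in dtypes.py,
# mapping host-side types to (assumed) Poplar type strings.
_POPLAR_TYPES = {
    bool: "bool",
    int: "int",
    float: "float",
}

def poplar_type(dtype):
    """Look up the Poplar type string for a host-side type (sketch)."""
    return _POPLAR_TYPES[dtype]
```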

Collaborator

Init code is for the part of the code that the used library requires for initialization (e.g., MPI_Init), and finalize code is the code needed for finalization (e.g., MPI_Finalize).

From what I see, engine.run(0) is similar to a CUDA kernel launch. Wouldn't it be better to generate this in targets/ipu.py?

If implemented like this, wouldn't there be problems if you need to call .run more than once?
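The split the reviewer suggests can be sketched as follows. The generator function and the exact emitted C++ lines are assumptions for illustration (the Engine/load/run calls mirror typical Poplar usage): graph compilation goes into once-only init code, while engine.run() is emitted in the per-call body so repeated SDFG invocations work.

```
# Sketch (hypothetical generator, not DaCE's actual API) of separating
# once-only IPU setup from the per-call program launch.

def generate_ipu_code():
    init_code = [
        "// __dace_init: build and compile the Poplar graph once",
        "poplar::Engine engine(graph, programs);",
        "engine.load(device);",
    ]
    call_code = [
        "// program body: emitted per SDFG call, like a CUDA kernel launch",
        "engine.run(0);",
    ]
    exit_code = [
        "// __dace_exit: release the device",
    ]
    return init_code, call_code, exit_code

init_code, call_code, exit_code = generate_ipu_code()
```

With run() in the per-call body, calling the compiled SDFG twice simply launches the program twice, avoiding the repeated-run problem raised above.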

Collaborator

Due to how Graphcore requires you to use the Poplar library, using it as a library is unavoidable. I want to point out that you can still put as much as possible in the IPU code-gen and have the IPU code-gen make this library mandatory.

Good luck and success in your DaCe backend implementation!
